R is a software environment for statistical computing and graphics. Using R you can do rigorous statistical analysis, clean and manipulate data, and create publication-quality graphics.
Packages are programs that you import into R to help make tasks easier. The most popular R packages for working with data include dplyr, stringr, tidyr, and ggplot2.
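As a minimal sketch of what importing a package looks like: a package is installed once from CRAN with install.packages() and then loaded in each R session with library(). The install line is commented out here on the assumption that ggplot2 is already installed.

```r
# Install once per machine (uncomment if the package is not yet installed)
# install.packages("ggplot2")

# Load the package at the start of each R session
library(ggplot2)

# Confirm that the package is attached and check its version
"package:ggplot2" %in% search()
packageVersion("ggplot2")
```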
There’s no easy way (yet) for new R users to find R packages that they might need. People are working on this problem. In the meantime, consult the following list or ask a Librarian!
Resources include:
You can create graphs in R without installing a package, but packages will allow you to create better visualizations that are any of the following:
ggplot2 is the most popular visualization package for R. It’s the best all-purpose package for creating many types of 2-dimensional visualizations.
highcharter is an R package known as an htmlwidget, which allows you to use popular JavaScript visualization libraries in R. It is free unless you are using it for a commercial or government purpose.
library(highcharter)  # also provides the %>% pipe

data(citytemp)  # sample monthly temperature data shipped with highcharter
hc <- highchart() %>%
  hc_xAxis(categories = citytemp$month) %>%
  hc_add_series(name = "Tokyo", data = citytemp$tokyo) %>%
  hc_add_series(name = "London", data = citytemp$london) %>%
  hc_add_series(name = "Other city",
                data = (citytemp$tokyo + citytemp$london) / 2)
hc  # Print the chart
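library(leaflet)  # also provides the %>% pipe

# A fixed map (no zoom control or dragging) with a single marker
m <- leaflet(options = leafletOptions(zoomControl = FALSE, dragging = FALSE,
                                      minZoom = 15, maxZoom = 15)) %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(lng = -78.6697, lat = 35.7876,
             popup = "Hello World!")
m  # Print the map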
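library(plotly)  # attaches ggplot2, which supplies the economics dataset

p <- plot_ly(economics, x = ~date, y = ~unemploy / pop)
p  # Print the interactive plot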
ggplot2 is built on the principles of the Layered Grammar of Graphics by Hadley Wickham (2010), which is based on work by Wilkinson, Anand, & Grossman (2005) and Jacques Bertin (1983).
Essentially: graphs are like sentences you can construct, and they have a grammar. The grammar of graphics consists of the following:
at least one layer:
plus the following:
* scale
* coordinate system
* facet (optional)
These components make up a graph.
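To make these components concrete, here is a sketch that spells out each part of the grammar explicitly for the mpg data. A layer is built with ggplot2's layer() function from data, an aesthetic mapping, a geometric object, a statistic, and a position adjustment. In everyday use, helpers such as geom_point() fill in most of these pieces with defaults; the choice of variables (displ, hwy, drv) is only illustrative.

```r
library(ggplot2)

p <- ggplot() +
  # Layer: data, aesthetic mapping, geometric object, statistic, position
  layer(
    data = mpg,
    mapping = aes(x = displ, y = hwy),
    geom = "point",
    stat = "identity",
    position = "identity"
  ) +
  # Scales: how data values translate to positions on each axis
  scale_x_continuous() +
  scale_y_continuous() +
  # Coordinate system: ordinary Cartesian axes
  coord_cartesian() +
  # Facet (optional): one panel per drive type
  facet_wrap(~ drv)
p
```

Dropping the scale, coordinate, and facet lines produces a single-panel plot, because ggplot2 supplies default scales and Cartesian coordinates when none are given.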
Download the following file: script.R (click the blue download button).
Open RStudio and choose File > Open File…
Select the script.R file that you just downloaded (probably in your Downloads folder) and click Open.
Let’s see an example of a simple graph created with ggplot. We are going to use the mpg data set about different cars and their properties.
Run ?mpg to learn more about this dataset. To run the code, highlight it and then click Run (shortcut keys: Mac: Command + Enter, Windows: Ctrl + Enter).
?mpg
head(mpg)
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p
## 3 audi a4 2.0 2008 4 manual(m6) f 20 31 p
## 4 audi a4 2.0 2008 4 auto(av) f 21 30 p
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p
## # ... with 1 more variables: class <chr>
The graph below uses ggplot2 to look for correlation between a car’s engine displacement and highway mileage.
library(ggplot2): loads the ggplot2 package
ggplot(): function that tells R that you want to make a graph with ggplot2
data = mpg: says that you want to use the mpg dataset (sample data that comes with ggplot2)
geom_point(): function that says you want to make a scatterplot
mapping = aes(): argument and function that map data variables to aesthetics, here the x and y axes
**Run the following code in your script file:**
library(ggplot2)

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
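The scatterplot gives a visual impression of the relationship. As a complementary sketch, base R's cor() function puts a number on it; a negative value indicates that highway mileage tends to fall as engine displacement rises. This check is an assumption about what a reader might want next, not a step in script.R.

```r
library(ggplot2)  # provides the mpg dataset

# Pearson correlation between engine displacement and highway mileage
r <- cor(mpg$displ, mpg$hwy)
r
```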
Make a scatterplot with cyl mapped to the x-axis and hwy mapped to the y-axis.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))
Make a scatterplot with displ on the x-axis and hwy on the y-axis, with class mapped to the color aesthetic. Run:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
The type of drive system the car has (4-wheel, rear-wheel, and front-wheel) is mapped to color.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))
Variables can be mapped to the following aesthetic parameters. If you are publishing in b/w, and can’t use color, you might want to use size or shape:
* color
* size
* shape
* alpha (transparency)
Substitute another aesthetic in place of color. Run the code:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))
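One possible answer, as a sketch: the same plot with drv mapped to shape instead of color, which remains readable in black and white. The second plot shows a related idea, setting alpha to a fixed value outside aes() so that it applies to every point rather than varying with the data. The value 0.4 is an arbitrary choice.

```r
library(ggplot2)

# Map drive type to point shape instead of color
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv))
p_shape

# Set (rather than map) transparency: a constant goes outside aes()
p_alpha <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), alpha = 0.4)
p_alpha
```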
Facets are a way to create multiple smaller charts, or subplots, based on a variable. Run this code to see what faceting does:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
Substitute another variable in the dataset for class, for example trans, drv, or cyl:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
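One possible substitution, as a sketch: faceting by drv instead of class gives one panel per drive type. Because drv has three values, nrow = 1 places the panels side by side; that layout is just one reasonable choice.

```r
library(ggplot2)

# One panel per drive type, arranged in a single row
p_drv <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ drv, nrow = 1)
p_drv
```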
Facet grids allow for an extra dimension of faceting. Run this code in your script to see what facet_grid() does:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
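In the formula passed to facet_grid(), the variable on the left of ~ defines the rows and the variable on the right defines the columns. As a sketch of this, replacing one side with a period facets along a single dimension only. The object names below are illustrative.

```r
library(ggplot2)

# Shared scatterplot that both faceted versions build on
scatter <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Rows only: one row of panels per drive type
p_rows <- scatter + facet_grid(drv ~ .)
p_rows

# Columns only: one column of panels per number of cylinders
p_cols <- scatter + facet_grid(. ~ cyl)
p_cols
```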
Now create a new scatter plot with the dataset diamonds using ggplot2. Refer to previous code examples for assistance.
head(diamonds)
## # A tibble: 6 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = cut)) +
  facet_wrap(~ cut, nrow = 2)
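The diamonds dataset is far larger than mpg, so many points are drawn on top of one another. A hedged variation on the plot above sets alpha to a constant so that dense regions appear darker; the value 0.2 is an arbitrary starting point to adjust by eye.

```r
library(ggplot2)

nrow(diamonds)  # number of rows in the dataset

# Same plot as above, with semi-transparent points to reveal density
p_diamonds <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = cut), alpha = 0.2) +
  facet_wrap(~ cut, nrow = 2)
p_diamonds
```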